ABSTRACT

Deforestation is an important emerging problem in developing countries. It leads to the permanent clearing of 
forests, which has a major impact on finite resources. Forest destruction has to be monitored through change analysis 
over time in order to derive the factors driving deforestation. It is essential to identify and understand the causes of 
deforestation. 

Drivers of deforestation can be categorized in different ways, for example as urbanization, agriculture, roads, and mining. 
Categorization can be done using classification techniques from data mining. We analyzed forest cover maps based on 
satellite data, converted to a geospatial database, to obtain the output. The objective of this paper is to focus on the factors of 
deforestation and their associations in the study area. This paper presents class association rules for a deforestation data 
set, which yield higher accuracy than general classification and association rules. 

KEYWORDS: Deforestation, Data Mining, Association Rule, Class Association Rule, Classification 
INTRODUCTION 

Due to the increasing volume and complexity of databases, the search for new data mining techniques has been 
emphasized [9]. Data Mining or Knowledge Discovery is needed to make sense and use of data. Knowledge Discovery in 
Data is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data 
[7]. Data mining is defined as an information extraction activity whose objective is to discover hidden facts contained in 
databases. Data mining tasks, including association rule mining, classification and prediction, and cluster analysis, 
have been successfully used to analyze spatial data related to forest fires [11][16][17][6][23][12]. 

Generally speaking, association rule mining [3] and classification rule mining [21] are among the most effective and 
efficient techniques in data mining. Association rule mining was originally intended to discover regularities between items 
in large transaction databases. Classification rule mining is widely used to predict the class of future objects whose class 
label is not known. Even though association and classification techniques differ considerably, both association 
rules and classification rules can be represented as if-then rules. 

To discover strongly correlated rules, different measures have been proposed to evaluate the interestingness of 
patterns, such as support and confidence [3]. Association rule mining is one of the techniques that use 
support and confidence to identify interesting rules. The use of association rule mining for classification was first 
introduced in 1997 by Ali et al. and Bayard [4][2], and was named class association rule mining or associative classification. 
The first classifier based on association rules was CBA [18], proposed by Liu et al. in 1998. 

Class association rule mining process can be categorized in three steps: 

• Find frequent item sets and frequent class association rules. 

• Find the strong class association rules by pruning the weak rules. 

• Design a classifier [21]. 

In recent years, deforestation has been one of the most potent factors in the degradation of ecosystems. Differentiating 
between natural and human-induced forest changes is a complex task, but it is necessary for analyzing the underlying causes 
of disturbances in forest change. This can be done by combining remote sensing and Geographical Information 
Systems (GIS). Research using remotely sensed satellite data has focused attention on image classification, because 
classification results are the basis for interpretation, analysis and modeling in various environmental and socio-economic 
applications [13]. Remote sensing is a set of activities for obtaining information about objects that constitute the Earth's 
surface, without physical contact with them, using satellites [23]. Data mining techniques can be applied to generate class 
association rules for analyzing deforestation. In this paper we apply the class association rule technique to our 
data to analyze the classes and the associations among them. We generated the class association rules in the WEKA data 
mining tool, an open source suite of machine learning algorithms. 

PROBLEM DOMAIN 

Forests are a critical component of the planet's ecosystem. Unfortunately, there has been significant degradation 
in forest cover over recent decades as a result of logging, conversion to cropland, mining and urbanization, or disasters 
(natural or man-made) such as forest fires, floods, and hurricanes [19]. Substantial attention is being given to the sustainable 
use of forests. A key to effective forest management is to pay attention to acquiring knowledge about changes in forests and 
identifying the factors behind declining forest area. This can be done by integrating GIS, remote sensing (RS) and data mining 
techniques. The study area covers 5000 square kilometers, which includes Chittoor, Kadapa and Nellore districts. The 
boundary lies between a lower left corner at 78°45′ E longitude and 13°35′ N latitude and an upper right corner at 79°39′ E 
longitude and 14°33′ N latitude, with an area of 15,379 square kilometers of Kadapa district, which includes 51 Mandals 
and three Revenue Divisions. The geographical area of Chittoor district lies between 12°37′ and 14°18′ N latitude and 
78°33′ and 79°55′ E longitude. The district area is 13,076 square kilometers, divided administratively into three Revenue 
Divisions and 46 Mandals. The data are derived from Manjula et al. (2011) and consist of maps and tables used for the 
association technique [20]. The data set consists of 5 attributes and 99 instances. The outline map of the study area 
is shown in Figure 1: 



Data Preprocessing 

We collected images from 1991, 2001 and 2011, and from those images we extracted the digital data using image 
analysis [20]. Figures 3(a) and (b) and 2.2(a), (b) and (c) show the toposheets and satellite images of the study area. 
Figure 1: Outline Map of Study Area 

Figure 3: Study Reference Map: (a) Row Index Map of AP (b) Boundary Map (c) SIGHT of Study Area 
Data Preparation 

Comparing classified images is the best approach for detecting change in satellite images. The 
comparison of classified maps identifies not only the location of change but also its nature and type 
for the study area. Our primary objective is to define deforestation by considering only two classes, forest and 
non-forest. Figures 2.2 and 2.3 show the toposheets, thematic and scanned images from which the base map is generated. 
Figure 2.4 is a base map that displays the mandals of the study area. Then, using ISO Cluster and the 
Maximum-Likelihood method, we classified the three images, which represent three decades. From the classification 
results, the maps are generated as outputs. 



Figure 4: Outline Map Showing Mandals of Study Area 

Figure 5: Classified Raster Image 

Figure 6: Study Area Classified Vector Images of 1991, 2001 and 2011 

Table 1: Spatial Predicate for the Year 1991 (Singlemine1991) 

Table 2: Spatial Predicate for the Year 2001 (Singlemine2001) 

Table 3: Spatial Predicate for the Year 2011 (Singlemine2011) 

Tables 1, 2 and 3 give the spatial predicates derived from the output of the classified images; these serve as input 
for generating classification rules. 

CLASS ASSOCIATION RULE MINING 
Association Rule Mining 

Association rule mining is one of the vital approaches in data mining for finding frequent item sets. Association 
rules are capable of revealing interesting relationships in large databases. An association rule is defined as follows: "Let 
$I = \{i_1, i_2, \dots, i_m\}$ be a set of literals, called items. Let $D$ be a set of transactions (database), where each 
transaction $T$ is a set of items such that $T \subseteq I$. TID indicates a unique transaction identifier. An association rule 
is an implication of the form $X \rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$. $X$ is called 
the antecedent while $Y$ is called the consequent of the rule." This method is well 
known in data mining and is applied to market analysis by looking for items that are frequently associated in a commercial 
transaction [7]. It has been extended to deal with spatial data to express rules like: 

$A_1 \wedge A_2 \wedge \dots \wedge A_m \wedge \text{Spatial Relations} \Rightarrow B_1 \wedge \dots \wedge B_n \wedge \text{Spatial Relations}\ [s, c]$, 
where $A_i$ and $B_j$ are predicates like attribute = constant_value, $s$ is the rule support and $c$ the rule confidence. 
These rules are used to find associations between properties of objects and those of neighboring objects. The rules that 
satisfy both the minimum support and the minimum confidence thresholds are said to be strong association rules. 

For example, the rule: $\mathit{is\_a}(x, \mathit{gas\_station}) \wedge \mathit{within}(x, \mathit{rural\_area}) \rightarrow \mathit{close\_to}(x, \mathit{highway})$ [65%, 80%] [4] 
CLASSIFICATIONS 

Classification is one of the important areas of data mining [9]. It is a technique in which the 
data stored in a database is analyzed to find rules that describe the partition of the database into a given set of 
classes. Each object in the database is assumed to belong to a predefined class, determined by one of the attributes, 
called the class label attribute. A number of classification methods have been proposed by statistics and machine learning 
researchers [7]. The objective of classification is to predict the class of future objects whose class label is not known. 

Association rules and classification rules are both represented as if-then rules. However, there are some 
differences between them. Association rules are commonly used as descriptive tools that present association 
relationships to application experts, whereas classification rules are used to predict unseen test data. 
A major problem in association rule mining is its complexity. The result of an association rule mining algorithm is 
not the set of all potential relationships, but the set of all interesting ones. That is a vital issue of the mining process, but the 
quality of the resulting rule set is ignored. On the other hand, there are approaches that explore the discriminating power of 
association rules and use them accordingly to solve a classification problem [26][5]. 

Class Association Rule Mining 

Associative classification is a recent technique that applies association rule mining to classification 
and achieves high classification accuracy. Let $D$ be a dataset with a set $T$ of tuples. Each tuple follows the schema 
$(A_1, A_2, \dots, A_N, A_C)$, in which $(A_1, A_2, \dots, A_N)$ are $N$ attributes and $A_C$ is the target class. The 
attributes may be either categorical or continuous. For continuous attributes, the value range is discretized into intervals. 
An attribute-value pair is represented as an item. For any two disjoint frequent attribute-value subsets $X$ and $Y$ of $A$, 
the patterns of the form $X \rightarrow Y$ are called association rules, where $X$ and $Y$ are disjoint sets (i.e., 
$X \cap Y = \emptyset$). Frequent attribute-value sets, and then association rules, can 
be generated using the popular methods Apriori [3][8], FP-Growth [10][8] or any other well-known technique. The 
attribute-value sets $X$ and $Y$ are called the antecedent and consequent of the association rule, respectively. Class 
Association Rules (CARs) are the association rules with the class label attribute as the only consequent. Let 
$A = \{A_1, A_2, A_3, \dots, A_m, C\}$ be the $m+1$ distinct attributes and $C = \{c_1, c_2, \dots, c_t\}$ be the class label 
attribute with $t$ classes. Suppose the item set $T \subseteq A$, where $A$ is the item set of any items with attributes 
$(A_1, A_2, \dots, A_n)$, and $c$ is a 1-itemset of the class attribute. A class association rule can then be 
represented as 

$T \rightarrow c$ 

Here, $T$ may contain a single item or multiple items. 



Class Association Rule Mining for Analyzing Deforestation Factors 






A CAR is of the form $L \rightarrow (C, c_i)$, where the pattern $L$ is an attribute-value pair from the attribute set 
$\{A \setminus C\}$ and $c_i$ is the class label value for $C$ [KB10]. Generation of class association rules (CARs) is 
generally controlled by two measures called support and confidence, which are given below: 

$\text{Support} = P(X \cup Y) = P(XY) = \dfrac{\text{number of tuples that contain both } X \text{ and } Y}{\text{total number of tuples in } D}$ 

$\text{Confidence} = P(Y \mid X) = \dfrac{P(X \cup Y)}{P(X)} = \dfrac{P(XY)}{P(X)}$ 
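
The two measures can be illustrated with a minimal Python sketch. The helper names (`covers`, `support`, `confidence`) and the four toy records are hypothetical and are not taken from the deforestation data set; they only show how the counts in the definitions above turn into numbers.

```python
def covers(record, items):
    """True when the record contains every attribute=value pair in items."""
    return all(record.get(attr) == value for attr, value in items.items())


def support(records, antecedent, label, class_attr="Class"):
    """Fraction of all records that match the antecedent AND carry the class label."""
    both = sum(1 for r in records if covers(r, antecedent) and r[class_attr] == label)
    return both / len(records)


def confidence(records, antecedent, label, class_attr="Class"):
    """Among records matching the antecedent, the fraction carrying the class label."""
    covered = [r for r in records if covers(r, antecedent)]
    if not covered:
        return 0.0
    both = sum(1 for r in covered if r[class_attr] == label)
    return both / len(covered)


# Hypothetical toy records, for illustration only.
toy_records = [
    {"Agri": "yes", "Road": "yes", "Class": "AR"},
    {"Agri": "yes", "Road": "yes", "Class": "AR"},
    {"Agri": "yes", "Road": "no", "Class": "A"},
    {"Agri": "no", "Road": "yes", "Class": "R"},
]

rule_lhs = {"Agri": "yes", "Road": "yes"}
sup = support(toy_records, rule_lhs, "AR")      # 2 of 4 records
conf = confidence(toy_records, rule_lhs, "AR")  # 2 of the 2 covered records
```

With these toy records the rule Agri=yes, Road=yes → AR has support 0.5 and confidence 1.0, while the shorter rule Agri=yes → AR has confidence 2/3 because one of the three covered records belongs to class A.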

The first algorithm to introduce the idea of using association rules for classification was the CBA algorithm proposed 
by Liu et al. [18]. The rule generation stage of CBA (CBA-RG) is as follows: 

 1  F_1 = {large 1-ruleitems}; 
 2  CAR_1 = genRules(F_1); 
 3  prCAR_1 = pruneRules(CAR_1); 
 4  for (k = 2; F_{k-1} ≠ ∅; k++) do 
 5      C_k = candidateGen(F_{k-1}); 
 6      for each data case d ∈ D do 
 7          C_d = ruleSubset(C_k, d); 
 8          for each candidate c ∈ C_d do 
 9              c.condsupCount++; 
10              if d.class = c.class then c.rulesupCount++ 
11          end 
12      end 
13      F_k = {c ∈ C_k | c.rulesupCount ≥ minsup}; 
14      CAR_k = genRules(F_k); 
15      prCAR_k = pruneRules(CAR_k); 
16  end 
17  CARs = ∪_k CAR_k; 
18  prCARs = ∪_k prCAR_k; 

Figure 7: The CBA-RG Algorithm 

The algorithm works as follows. In the first pass, it counts the item and class 
occurrences to determine the frequent 1-ruleitems; from these, a set of CARs is generated by genRules. Pruning is also 
applied to $CAR_k$ in each subsequent pass. A rule is pruned as follows: if rule $r$'s pessimistic error rate is higher than the 
pessimistic error rate of rule $r^-$ (obtained by deleting one condition from $r$), then rule $r$ is pruned. 

For each successive pass, say the $k$-th pass, the algorithm performs four major tasks. First, the frequent ruleitems 
$F_{k-1}$ found in the $(k-1)$-th pass are used to generate the candidate ruleitems $C_k$ using the candidateGen function 
(line 5). It then scans the database and updates the support counts of the candidates in $C_k$. After the new frequent 
ruleitems have been identified to form $F_k$, the algorithm produces the rules $CAR_k$ using the genRules function. Lastly, 
rule pruning is performed on these rules. 

The candidateGen function works similarly to the candidate generation function of the Apriori algorithm. The main 
difference is that here the support counts of the condset and of the rule must be incremented separately, whereas in the 
Apriori algorithm only one count is updated. The original set of rules is in CARs and the pruned rules are in prCARs. 
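
The passes described above can be sketched in Python as follows. This is a simplified illustration and not the authors' or WEKA's implementation: the function name `cba_rg` and its internal helpers are hypothetical, the pessimistic-error pruning step (pruneRules) is deliberately omitted, and the toy records are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def cba_rg(records, class_attr, min_sup, min_conf):
    """Sketch of CBA-RG rule generation WITHOUT the pruning step (assumption)."""
    n = len(records)
    # Each data case becomes (set of attribute=value items, class label).
    cases = [
        (frozenset((a, v) for a, v in r.items() if a != class_attr), r[class_attr])
        for r in records
    ]

    def count(candidates):
        # One database scan: condsupCount per condset, rulesupCount per ruleitem.
        cond_sup, rule_sup = defaultdict(int), defaultdict(int)
        condsets = {cond for cond, _ in candidates}
        for items, label in cases:
            for cond in condsets:
                if cond <= items:
                    cond_sup[cond] += 1
                    if (cond, label) in candidates:
                        rule_sup[(cond, label)] += 1
        return cond_sup, rule_sup

    def gen_rules(frequent, cond_sup, rule_sup):
        # For each condset keep the best-supported class if it is confident enough.
        by_cond = defaultdict(list)
        for cond, label in frequent:
            by_cond[cond].append(label)
        out = []
        for cond, labels in by_cond.items():
            best = max(labels, key=lambda lab: rule_sup[(cond, lab)])
            conf = rule_sup[(cond, best)] / cond_sup[cond]
            if conf >= min_conf:
                out.append((cond, best, rule_sup[(cond, best)] / n, conf))
        return out

    def candidate_gen(frequent, k):
        # Apriori-style join of same-class ruleitems, then subset-based pruning.
        out = set()
        for (c1, l1), (c2, l2) in combinations(frequent, 2):
            if l1 != l2:
                continue
            union = c1 | c2
            if len(union) != k or len({a for a, _ in union}) != k:
                continue
            if all((frozenset(s), l1) in frequent for s in combinations(union, k - 1)):
                out.add((union, l1))
        return out

    candidates = {(frozenset([i]), label) for items, label in cases for i in items}
    rules, k = [], 1
    while candidates:
        cond_sup, rule_sup = count(candidates)
        frequent = {c for c in candidates if rule_sup[c] / n >= min_sup}
        rules.extend(gen_rules(frequent, cond_sup, rule_sup))
        k += 1
        candidates = candidate_gen(frequent, k)
    return rules


# Hypothetical toy records, for illustration only.
toy = [
    {"Agri": "yes", "Road": "yes", "Class": "AR"},
    {"Agri": "yes", "Road": "yes", "Class": "AR"},
    {"Agri": "yes", "Road": "no", "Class": "A"},
    {"Agri": "no", "Road": "yes", "Class": "R"},
]
toy_rules = cba_rg(toy, "Class", min_sup=0.25, min_conf=0.9)
```

Each returned tuple is (condset, class, support, confidence). The separate `cond_sup` and `rule_sup` counters mirror the condsupCount and rulesupCount of the pseudocode, which is the point where the procedure departs from plain Apriori.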

Mining class association rules can be viewed as a special form of mining association rules, since a set of 
association rules with predefined objectives can be used for classification. The class association rule approach consists of 
two steps: (1) it applies the Apriori algorithm to discover frequent item sets; (2) it builds the classifier. 

In the first phase, rule generation, CAR computes the complete set of rules of the form $R: T \rightarrow c$, where $T$ is a 
pattern in the data set and $c$ is a class label, such that $\mathrm{sup}(R)$ and $\mathrm{conf}(R)$ pass the given support 
and confidence thresholds, respectively. Furthermore, CAR prunes rules and selects only a subset of high-quality rules for 
classification. 

In the second phase, classification, CAR extracts the subset of rules matching the item and predicts the class label of 
the item by analyzing this subset of rules. 
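
A minimal Python sketch of the classification phase follows. The text does not specify how the matching subset is analyzed, so the sketch assumes one common choice: predict the class of the matching rule with the highest confidence, breaking ties by support count. The function name `predict` and the rule dictionary layout are hypothetical; the two example rules are transcribed from rules 1 and 8 reported in the results section.

```python
def predict(rules, record, default=None):
    """Return the class of the best matching rule, or default when none matches."""
    matching = [
        rule for rule in rules
        if all(record.get(attr) == value for attr, value in rule["if"].items())
    ]
    if not matching:
        return default
    # Assumption: rank by confidence first, then by support count.
    best = max(matching, key=lambda rule: (rule["confidence"], rule["count"]))
    return best["then"]


example_rules = [
    {"if": {"Agri": "yes", "Built-up": "no", "Mining": "no", "Road": "yes"},
     "then": "AR", "count": 42, "confidence": 1.0},
    {"if": {"Agri": "no", "Built-up": "no", "Road": "yes"},
     "then": "R", "count": 11, "confidence": 1.0},
]

case_a = {"Agri": "yes", "Built-up": "no", "Mining": "no", "Road": "yes"}
case_b = {"Agri": "no", "Built-up": "no", "Mining": "no", "Road": "yes"}
case_c = {"Agri": "no", "Built-up": "no", "Mining": "no", "Road": "no"}

label_a = predict(example_rules, case_a, default="unknown")
label_b = predict(example_rules, case_b, default="unknown")
label_c = predict(example_rules, case_c, default="unknown")
```

The `default` argument stands in for the default class that a rule-based classifier needs when no rule covers a case; how that default is chosen is left open here.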

In this paper our analysis is focused on applying the class association rules for our data set. 

EXPERIMENTS AND RESULTS 

From the above remote sensing data and tables we extracted the data for the years 1991, 2001 and 2011 and 
developed the deforestation data set for the study area. The sample deforestation data set, converted into an ARFF file, is 
given in Table 4. 

WEKA is a popular data mining system developed at the University of Waikato. It is an open source machine 
learning environment that provides useful data mining and machine learning algorithms. In this paper, we 
implemented association rule-based classification in the WEKA framework. The table below gives a sample of our data set, 
which depicts the different possible combinations of deforestation factors 
in the original data set. 
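
As a rough illustration of the ARFF conversion, the Python sketch below writes nominal attributes and data rows in ARFF layout. The attribute names are the ones that appear in the mined rules (Agri, Built-up, Mining, Road, Class); the function name `to_arff`, the relation name, the restricted class value list and the two sample rows are assumptions made for the example and are not the contents of Table 4.

```python
def to_arff(relation, attributes, rows):
    """Render nominal attributes and rows as ARFF text.

    attributes: list of (name, allowed_values) pairs, in column order.
    rows: list of dicts mapping attribute name to value.
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row[name] for name, _ in attributes))
    return "\n".join(lines) + "\n"


# Attribute names follow the rules reported in the paper; the class values
# listed here cover only the two hypothetical sample rows below.
ATTRIBUTES = [
    ("Agri", ["yes", "no"]),
    ("Built-up", ["yes", "no"]),
    ("Mining", ["yes", "no"]),
    ("Road", ["yes", "no"]),
    ("Class", ["AR", "R"]),
]

SAMPLE_ROWS = [
    {"Agri": "yes", "Built-up": "no", "Mining": "no", "Road": "yes", "Class": "AR"},
    {"Agri": "no", "Built-up": "no", "Mining": "no", "Road": "yes", "Class": "R"},
]

arff_text = to_arff("deforestation", ATTRIBUTES, SAMPLE_ROWS)
```

Placing the class attribute last matches the convention described for WEKA, where a class index of -1 selects the final attribute.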



Table 4: Sample Arff Data Set 
Class association rule mining can be done in the WEKA machine learning workbench with the Apriori algorithm, 
an association rule mining technique that has a car option in WEKA. If the car option is 
set to true, class association rules are mined instead of general association rules. The class attribute is specified by the class 
index. If the class index is set to -1, the last attribute in the data set is taken as the class attribute, 
so every mined rule has the class attribute as its consequent, forming class association rules. This method was applied to our 
data set to retrieve the class association rules, and we obtained some interesting rules. 

Some interesting Class Association Rules are: 

1. Agri=yes Built-up=no Mining=no Road=yes 42 ==> Class=AR 42 conf:(1) 

2. Agri=yes Mining=yes Road=yes 17 ==> Class=AMR 17 conf:(1) 

3. Built-up=yes Road=no 12 ==> Class=B 12 conf:(1) 

4. Agri=yes Built-up=yes Road=no 12 ==> Class=AB 12 conf:(1) 

5. Agri=no Mining=yes Road=yes 12 ==> Class=BM 12 conf:(1) 

6. Built-up=yes Mining=yes Road=yes 12 ==> Class=BMR 12 conf:(1) 

7. Agri=yes Built-up=yes Mining=yes Road=yes 12 ==> Class=ABMR 12 conf:(1) 

8. Agri=no Built-up=no Road=yes 11 ==> Class=R 11 conf:(1) 

9. Agri=yes Built-up=yes Road=yes 11 ==> Class=ABR 11 conf:(1) 

10. Agri=yes Mining=no Road=no 11 ==> Class=A 11 conf:(1) 

12. Built-up=yes Mining=no Road=yes 11 ==> Class=BR 11 conf:(1) 

13. Agri=yes Built-up=no Mining=no Road=no 11 ==> Class=A 11 conf:(1) 
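
Each rule above lists the antecedent, the number of instances it covers, the predicted class, the number of covered instances with that class, and the confidence. The Python sketch below parses one such line into a structured record so the confidence can be cross-checked against the two counts. The regular expression and the function name `parse_rule` are hypothetical helpers written for this layout only, not part of WEKA.

```python
import re

# Matches lines such as:
#   "1. Agri=yes Built-up=no Mining=no Road=yes 42 ==> Class=AR 42 conf:(1)"
RULE_RE = re.compile(
    r"^\s*(?P<num>\d+)\.\s+(?P<lhs>.+?)\s+(?P<covered>\d+)\s+==>\s+"
    r"Class=(?P<label>\S+)\s+(?P<correct>\d+)\s+conf:\((?P<conf>[\d.]+)\)"
)


def parse_rule(line):
    """Parse one rule line; return None when the line does not match."""
    match = RULE_RE.match(line)
    if match is None:
        return None
    conditions = dict(item.split("=", 1) for item in match.group("lhs").split())
    return {
        "number": int(match.group("num")),
        "conditions": conditions,
        "covered": int(match.group("covered")),   # instances matching the antecedent
        "label": match.group("label"),
        "correct": int(match.group("correct")),   # covered instances with this class
        "confidence": float(match.group("conf")),
    }


parsed = parse_rule(
    "1. Agri=yes Built-up=no Mining=no Road=yes 42 ==> Class=AR 42 conf:(1)"
)
# Confidence should equal correct / covered, here 42 / 42.
consistent = parsed["correct"] / parsed["covered"] == parsed["confidence"]
```

For rule 1 the two counts are both 42, so the ratio reproduces the reported confidence of 1.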
Let us consider some of the rules. 

Rule 1 

This rule shows that even where there is no built-up area or mining, agricultural development leads to road 
construction; agriculture and roads are then the factors of forest degradation. 

Rule 12 

This rule represents cases where roads are not constructed even though agriculture is developed in the forest; in such 
cases only agriculture is the factor of deforestation. 

Rule 7 

This rule shows that the development of built-up area may lead to the construction of roads, so that built-up area and 
roads are considered the factors of deforestation. In some cases roads are not constructed even though urbanization, that is 
built-up area, has developed; this is clearly shown in rule 3. 

Rule 8 

According to this rule, there is no extension of agricultural land or built-up area, but the extension 
of roads may occur, which leads to degradation of the forest; the factor of deforestation is identified as roads. 

CONCLUSIONS 

Several individual methods exist for obtaining classification and association rules. Combining 
classification and association is a novel methodology that applies association to classification 
and also yields high accuracy of classification rules. The main objective of this paper is to apply class association 
rules to our data set. The experiment was carried out on our data set to validate the results. We examined two major 
challenges in class association rules: efficiency in handling a large number of association rules, and effectiveness in 
predicting new class labels with greater accuracy. 